skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Matuszek, Cynthia"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. To support positive, ethical human-robot interactions, robots need to be able to respond to unexpected situations in which societal norms are violated, including rejecting unethical commands. Implementing robust communication for robots is inherently difficult due to the variability of context in real-world settings and the risks of unintended influence during robots’ communication. HRI researchers have begun exploring the potential use of LLMs as a solution for language-based communication, which will require an in-depth understanding and evaluation of LLM applications in different contexts. In this work, we explore how an existing LLM responds to and reasons about a set of norm-violating requests in HRI contexts. We ask human participants to assess the performance of a hypothetical GPT-4-based robot on moral reasoning and explanatory language selection as it compares to human intuitions. Our findings suggest that while GPT-4 performs well at identifying norm violation requests and suggesting non-compliant responses, its flaws in not matching the linguistic preferences and context sensitivity of humans prevent it from being a comprehensive solution for moral communication between humans and robots. Based on our results, we provide a four-point recommendation for the community in incorporating LLMs into HRI systems. 
    more » « less
    Free, publicly-accessible full text available November 24, 2025
  2. Large Language Models (LLMs) are pre-trained on large-scale corpora and excel in numerous general natural language processing (NLP) tasks, such as question answering (QA). Despite their advanced language capabilities, when it comes to domain-specific and knowledge-intensive tasks, LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions. Additionally, fine tuning LLMs' intrinsic knowledge to highly specific domains is an expensive and time consuming process. The retrieval-augmented generation (RAG) process has recently emerged as a method capable of optimization of LLM responses, by referencing them to a predetermined ontology. It was shown that using a Knowledge Graph (KG) ontology for RAG improves the QA accuracy, by taking into account relevant sub-graphs that preserve the information in a structured manner. In this paper, we introduce SMART-SLIC, a highly domain-specific LLM framework, that integrates RAG with KG and a vector store (VS) that store factual domain specific information. Importantly, to avoid hallucinations in the KG, we build these highly domain-specific KGs and VSs without the use of LLMs, but via NLP, data mining, and nonnegative tensor factorization with automatic model selection. Pairing our RAG with a domain-specific: (i) KG (containing structured information), and (ii) VS (containing unstructured information) enables the development of domain-specific chat-bots that attribute the source of information, mitigate hallucinations, lessen the need for fine-tuning, and excel in highly domain-specific question answering tasks. We pair SMART-SLIC with chain-of-thought prompting agents. The framework is designed to be generalizable to adapt to any specific or specialized domain. In this paper, we demonstrate the question answering capabilities of our framework on a corpus of scientific publications on malware analysis and anomaly detection. 
    more » « less
    Free, publicly-accessible full text available December 18, 2025
  3. Virtual reality is progressively more widely used to support embodied AI agents, such as robots, which frequently engage in ‘sim-to-real’ based learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks and capabilities. In order to understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from an fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object. 
    more » « less
  4. The proliferation of Large Language Models (LLMs) presents both a critical design challenge and a remarkable opportunity for the field of Human-Robot Interaction (HRI). While the direct deployment of LLMs on interactive robots may be unsuitable for reasons of ethics, safety, and control, LLMs might nevertheless provide a promising baseline technique for many elements of HRI. Specifically, in this position paper, we argue for the use of LLMs as Scarecrows: ‘brainless,’ straw-man black-box modules integrated into robot architectures for the purpose of quickly enabling full-pipeline solutions, much like the use of “Wizard of Oz” (WoZ) and other human-in-the-loop approaches. We explicitly acknowledge that these Scarecrows, rather than providing a satisfying or scientifically complete solution, incorporate a form of the wisdom of the crowd, and, in at least some cases, will ultimately need to be replaced or supplemented by a robust and theoretically motivated solution. We provide examples of how Scarecrows could be used in language-capable robot architectures as useful placeholders, and suggest initial reporting guidelines for authors, mirroring existing guidelines for the use and reporting of WoZ techniques. 
    more » « less
  5. Over the past decade, the machine learning security community has developed a myriad of defenses for evasion attacks. An under- studied question in that community is: for whom do these defenses defend? This work considers common approaches to defending learned systems and how security defenses result in performance inequities across different sub-populations. We outline appropriate parity metrics for analysis and begin to answer this question through empirical results of the fairness implications of machine learning security methods. We find that many methods that have been proposed can cause direct harm, like false rejection and unequal benefits from robustness training. The framework we propose for measuring defense equality can be applied to robustly trained models, preprocessing-based defenses, and rejection methods. We identify a set of datasets with a user-centered application and a reasonable computational cost suitable for case studies in measuring the equality of defenses. In our case study of speech command recognition, we show how such adversarial training and augmentation have non-equal but complex protections for social subgroups across gender, accent, and age in relation to user coverage. We present a comparison of equality between two rejection-based de- fenses: randomized smoothing and neural rejection, finding randomized smoothing more equitable due to the sampling mechanism for minority groups. This represents the first work examining the disparity in the adversarial robustness in the speech domain and the fairness evaluation of rejection-based defenses. 
    more » « less
  6. Intuitively, students who start a homework assignment earlier and spend more time on it should receive better grades on the assignment. However, existing literature on the impact of time spent on homework is not clear-cut and comes mostly from K-12 education. It is not clear that these prior studies can inform coursework in deep learning due to differences in demographics, as well as the computational time needed for assignments to be completed. We study this problem in a post-hoc study of three semesters of a deep learning course at the University of Maryland, Baltimore County (UMBC), and develop a hierarchical Bayesian model to help make principled conclusions about the impact on student success given an approximate measure of the total time spent on the homework, and how early they submitted the assignment. Our results show that both submitting early and spending more time positively relate with final grade. Surprisingly, the value of an additional day of work is apparently equal across students, even when some require less total time to complete an assignment. 
    more » « less
  7. Human-robot interaction is a critical area of research, providing support for collaborative tasks where a human instructs a robot to interact with and manipulate objects in an environment. However, an under-explored element of these collaborative manipulation tasks are small-scale building exercises, in which the human and robot are working together in close proximity with the same set of objects. Under these conditions, it is essential to ensure the human’s safety and mitigate comfort risks during the interaction. As there is danger in exposing humans to untested robots, a safe and controlled environment is required. Simulation and virtual reality (VR) for HRI have shown themselves to be suitable tools for creating space for human-robot experimentation that can be beneficial in these scenarios. However, the use of simulation and VR comes with the possibility of failures resulting from the sim-to-real gap, where the behavior of the simulated robot may not accurately reflect the experience of a human collaborator in a real-world setting. This gap can limit the generalizability of research findings and raise questions about the validity of using simulation and VR for HRI research. Our goal in this work is to demonstrate the effectiveness of sim-to-real approaches for contact-based human-robot interaction. 
    more » « less
  8. Our study is motivated by robotics, where when dealing with robots or other physical systems, we often need to balance competing concerns of relying on complex, multimodal data coming from a variety of sensors with a general lack of large representative datasets. Despite the complexity of modern robotic platforms and the need for multimodal interaction, there has been little research on integrating more than two modalities in a low data regime with the real-world constraint that sensors fail due to obstructions or adverse conditions. In this work, we consider a case in which natural language is used as a retrieval query against objects, represented across multiple modalities, in a physical environment. We introduce extended multimodal alignment (EMMA), a method that learns to select the appropriate object while jointly refining modality-specific embeddings through a geometric (distance-based) loss. In contrast to prior work, our approach is able to incorporate an arbitrary number of views (modalities) of a particular piece of data. We demonstrate the efficacy of our model on a grounded language object retrieval scenario. We show that our model outperforms state-of-the-art baselines when little training data is available. Our code is available at https://github.com/kasraprime/EMMA 
    more » « less
  9. The overarching goal of this work is to enable the collection of language describing a wide variety of objects viewed in virtual reality. We aim to create full 3D models from a small number of ‘keyframe’ images of objects found in the publicly available Grounded Language Dataset (GoLD) using photogrammetry. We will then collect linguistic descriptions by placing our models in virtual reality and having volunteers describe them. To evaluate the impact of virtual reality immersion on linguistic descriptions of the objects, we intend to apply contrastive learning to perform grounded language learning, then compare the descriptions collected from images (in GoLD) versus our models. 
    more » « less